Skip to content

[ENH][wal3] CPU parts of garbage collection #4617

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from
Open

Conversation

rescrv
Copy link
Contributor

@rescrv rescrv commented May 23, 2025

Description of changes

This PR lays out the basics of garbage collection and the algorithm to use.

What remains to be done:

  • wire up the I/O pieces to pull in the right metadata for gc.
  • wire up the delete path after the CPU path culls data from a manifest.

Test plan

CI

  • Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Documentation Changes

  • Up-to-date rust/wal3/README.md

Copy link

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of a unexpectedly high quality (Readability, Modularity, Intuitiveness)

@rescrv rescrv changed the title [ENH][wal3] Document the GC structures and add the base structs. [ENH][wal3] CPU parts of garbage collection May 23, 2025
@rescrv rescrv marked this pull request as ready for review May 23, 2025 23:23
Copy link
Contributor

propel-code-bot bot commented May 23, 2025

Initial Implementation of Garbage Collection Algorithm in wal3 (CPU-side, Non-IO Path)

This PR introduces the core CPU-side algorithm and data structures for garbage collection (GC) in the Rust wal3 write-ahead log (WAL) library. It defines the Garbage and GarbageAction structures to model what data to collect, implements the CPU logic to compute GC effects on the in-memory manifest/snapshot data, adds tracking for collected setsums, provides methods to verify and apply garbage actions to manifests, and covers the full GC lifecycle with extensive property-based testing. The I/O integration and actual delete operations on persisted data are intentionally left as TODOs for future follow-up.

Key Changes:
• Introduced new gc.rs module with Garbage and GarbageAction types representing GC plans and actions.
• Implemented logic to compute which manifest fragments and snapshots are eligible for collection given a cutoff point, and the recursive setsum tracking for correctness.
• Added apply_garbage, has_collected_garbage, and utility methods to Manifest for applying and verifying the effects of GC.
• Extended the Manifest and Fragment types to track collected setsums for verification.
• Integrated property-based tests (rust/wal3/tests/properties.rs) for the GC logic, covering both fragments and snapshots.
• Updated documentation (README.md) to describe the GC file format, invariants, process, error handling, and performance considerations.
• Added missing workspace dev-deps for property testing (proptest, rand).
• Updated various call-sites and test code to handle new fields (e.g., collected setsum) and new GC logic.

Affected Areas:
• rust/wal3/src/gc.rs
• rust/wal3/src/manifest.rs
• rust/wal3/src/lib.rs
• rust/wal3/tests/properties.rs
• rust/wal3/README.md
• rust/wal3/src/manifest_manager.rs
• rust/wal3/src/copy.rs
• Cargo.lock
• rust/wal3/Cargo.toml
• test code and integration points for log-service

Potential Impact:

Functionality: Adds foundational, rigorously tested manifest/snapshot GC algorithmic support for the WAL, but does not yet delete files from object storage. Manifests now track collected garbage and enable end-to-end garbage collection state transitions and verification.

Performance: Limited immediate impact; actual I/O batching/deletion and any performance changes will come when I/O is wired in. CPU-only logic is property tested and should be efficient.

Security: No sensitive changes, but integrity checks and error handling are enhanced-setsum mismatches and partial collection are detected and explicitly handled.

Scalability: Prepares codebase for efficient, incremental GC even on large logs by supporting manifest-level tracking, batched GC, and partial manifest application.

Review Focus:
• Correctness of the recursive setsum tracking in Garbage and GarbageAction.
• Edge cases in the manifest-snapshot-fragment tree traversal and cutoff logic.
• Property-based test coverage for possible failure scenarios and correctness.
• Backward compatibility for manifest formats and optional collected field.
• Propagation and usage of new collected setsum field in existing code.

Testing Needed

• Run the new property-based tests (cargo test) to verify GC logic for both fragments and snapshots.
• Check application of Garbage to manifests with edge cases (cutoff at boundaries, overlapping snapshots/fragments, etc.).
• Verify scrub and setsum invariants hold after GC application.
• Manual testing can confirm the new collected setsum field is maintained correctly in manifests.

Code Quality Assessment

rust/wal3/src/gc.rs: Well-structured, idiomatic Rust; explicitly handles setsum integrity, recursion, and error reporting. No critical dead code or TODOs in the final version.

rust/wal3/src/manifest.rs: Cleanly extended to support new GC-related fields and logic without breaking compatibility.

rust/wal3/tests/properties.rs: Robust usage of proptest with clear scenarios testing the GC lifecycle and setsum correctness.

rust/wal3/README.md: Updated with accurate, detailed documentation on GC mechanics and contract.

Best Practices

State Integrity:
• Recursive validation of tree-structured data
• Defensive invariants in manifest application

Documentation:
README and inline docs fully updated to match implementation

Serialization:
• Serde usage for backward/forward compatibility

Testing:
• Property-based testing for invariants
• Extensive unit tests on core logic

Error Handling:
• Explicit error cases for setsum mismatches
• Defensive checks in destructive operations

Potential Issues

• Future IO integration is outstanding-actual removal from object storage is not yet implemented.
• Edge-case logic is complex; any missed test scenarios could allow subtle setsum mismatches. Reviewers should check recursive replace/drop edge cases closely.
• The new collected field defaults to zero for older data; there may be BC breakpoints for software expecting a manifest without this field.
• Complex state transitions in apply_garbage; interactions with concurrent writers/readers should be checked in implementation phase.

This summary was automatically generated by @propel-code-bot

@rescrv rescrv requested a review from sanketkedia May 23, 2025 23:32
&mut Setsum::default(),
)?);
}
todo!();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[CriticalError]

There's an unimplemented function in gc.rs that will cause runtime failures. The drop_snapshot method on line 105 ends with a todo!() macro, which will panic when called. This needs to be implemented before this code can be safely merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant